Update RNA-Seq Coverage.ipynb#2
Conversation
just changed a period to make it function
|
Thanks! does this actually run now? |
|
Nope..spoke too soon on that... intersect isn't working as advertised. To be fair I'm not using the arabidopsis file but a subsampled version of my own data(mouse), and just checking coverage at Gapdh
but with the as described method of using intersect
Just in case it was something silly I've also checked chrom = "6", chrom = 6, chrom = "Chr6", chrom = '6' |
|
Hmm - does the test arabidopsis data give the expected result? TBH, I don't deal with this type of data myself, so I won't be much help running it down. @CiaranOMara has all the commits on XAM.jl - can they help? |
|
I'll check whether I'm able to reproduce the results of the Arabidopsis thaliana data, then we may delve in from there. |
|
The tutorial is certainly out of date. @aleighbrown, I think you'd want to use the I've included my rough workflow below. # Download Reference genome: Arabidopsis thaliana (assembly TAIR10.1) https://www.ncbi.nlm.nih.gov/genome/?term=txid3702[orgn].
path_genome = download("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz", joinpath(@__DIR__, "GCF_000001735.4_TAIR10.1_genomic.fna.gz"))
# Unzip genome.
run(`gunzip $path_genome`)
# Download HISAT2 binary (http://ccb.jhu.edu/software/hisat2/index.shtml).
# path_hisat = download("ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-Linux_x86_64.zip")
path_hisat = download("ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-OSX_x86_64.zip")
# path_hisat = download("http://www.di.fc.ul.pt/~afalcao/hisat2.1/hisat2.1_Windows.zip")
# Unzip HISAT2.
run(`unzip $path_hisat -d $(@__DIR__)`)
# Build index.
run(`hisat2-2.1.0/hisat2-build GCF_000001735.4_TAIR10.1_genomic.fna GCF_000001735.4_TAIR10.1_genomic.fna`)
# Download SRR1238088.
run(`fasterq-dump SRR1238088`)
# Align.
run(`hisat2-2.1.0/hisat2 -x GCF_000001735.4_TAIR10.1_genomic.fna -1 SRR1238088_1.fastq -2 ./SRR1238088_2.fastq -S SRR1238088.sam`)
# Convert SAM to BAM.
run(`samtools view -Sb SRR1238088.sam -o SRR1238088.bam`)
# Sort BAM.
run(`samtools sort SRR1238088.bam -o SRR1238088.sort.bam`)
# Index BAM.
run(`samtools index SRR1238088.sort.bam`)
# The important bit.
using BioAlignments #v1.0.0
# AT3G11820
chrom = "NC_003074.8" # Chr3 (see https://www.ncbi.nlm.nih.gov/genome/?term=txid3702[orgn])
leftpos = 3729305
rightpos = 3731116
bamfile = "SRR1238088.sort.bam"
open(BAM.Reader, bamfile, index=bamfile * ".bai") do reader
count(_ -> true, eachoverlap(reader, chrom, leftpos:rightpos))
end
read(pipeline(`samtools view SRR1238088.sort.bam $chrom:$leftpos-$rightpos`, `wc -l`), String) |> stripIn terms of the "Chr6" stuff, you can get the names from the reader. open(BAM.Reader, bamfile, index=bamfile * ".bai") do reader
reader.refseqnames
end |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "using Bio.Align" |
|
Closed because I changed the default branch to Probably now could use XAM.jl, but I would prefer to skip using other deps (like |
just changed a period to make it function properly. Just putting it out here to give it a nudge for updates.